
    Empirical Methodology for Crowdsourcing Ground Truth

    The process of gathering ground truth data through human annotation is a major bottleneck in the use of information extraction methods for populating the Semantic Web. Crowdsourcing-based approaches are gaining popularity as a way to address the volume of data and the lack of annotators. Typically these practices use inter-annotator agreement as a measure of quality. However, in many domains, such as event detection, there is ambiguity in the data, as well as a multitude of perspectives on the information examples. We present an empirically derived methodology for efficiently gathering ground truth data in a diverse set of use cases covering a variety of domains and annotation tasks. Central to our approach is the use of CrowdTruth metrics that capture inter-annotator disagreement. We show that measuring disagreement is essential for acquiring a high-quality ground truth. We achieve this by comparing the quality of data aggregated with CrowdTruth metrics against majority vote, over a set of diverse crowdsourcing tasks: Medical Relation Extraction, Twitter Event Identification, News Event Extraction and Sound Interpretation. We also show that an increased number of crowd workers leads to growth and stabilization in the quality of annotations, going against the usual practice of employing a small number of annotators. Comment: in publication at the Semantic Web Journal
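    To make the contrast concrete, the following is a minimal sketch of majority-vote aggregation versus a graded, disagreement-aware score over a toy annotation matrix; the data and the exact scoring formula are illustrative assumptions, not the CrowdTruth implementation itself.

```python
# A minimal sketch, assuming a toy annotation matrix; the metric and data
# are illustrative, not the authors' exact CrowdTruth implementation.
import numpy as np

# Rows = workers, columns = candidate relations for one sentence;
# a 1 means the worker selected that relation (multiple choices allowed).
annotations = np.array([
    [1, 0, 0],   # worker 1 chose relation A
    [1, 1, 0],   # worker 2 chose relations A and B
    [0, 1, 0],   # worker 3 chose relation B
    [1, 0, 0],   # worker 4 chose relation A
])

# Majority vote: a relation counts as "true" only if more than half the
# workers picked it, discarding all information about disagreement.
majority = (annotations.sum(axis=0) > annotations.shape[0] / 2).astype(int)

# Disagreement-aware alternative: sum the worker vectors into a sentence
# vector and give each relation the cosine of its unit vector with that
# sentence vector, yielding a graded score instead of a hard 0/1 decision.
sentence_vector = annotations.sum(axis=0)
scores = sentence_vector / np.linalg.norm(sentence_vector)

print("majority vote:", majority)          # [1 0 0]
print("graded scores:", scores.round(2))   # [0.83 0.55 0.  ]
```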

    ENABLING MEDICAL EXPERT CRITIQUING USING A BDI APPROACH

    Expert critiquing systems were introduced to assist physicians in decision making, without forcing them to comply with a gold standard of care. Critiquing systems do this by providing critique on a physician's decisions, rather than telling him/her exactly what to do. In order to perform this task, a critiquing system must have knowledge of the diagnosis and the treatment processes, and must be able to link the actions performed by a physician to this knowledge. The development of formal languages for describing medical guidelines (protocols) and the nationwide introduction of electronic patient records (EPR) in the Netherlands facilitate the development of a new generation of medical critiquing systems. Essential to the success of this new generation of critiquing systems is the ability to match the actions prescribed in a medical guideline to the physician's actions reported in the EPR. Some authors have claimed that such a matching process is infeasible. This paper will show, however, that a BDI (beliefs, desires and intentions) approach enables a highly successful matching process, thereby enabling expert critiquing based on an EPR.
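    To illustrate the kind of matching the paper argues is feasible, here is a minimal sketch in which guideline steps (condition, action, goal) are checked against actions reported in an EPR; the data structures and example field values are invented for illustration and are not the paper's BDI formalism.

```python
# A minimal sketch of BDI-style matching with invented data structures
# (GuidelineStep, belief strings, EPR action strings).
from dataclasses import dataclass

@dataclass
class GuidelineStep:
    goal: str        # the desire/intention the step serves
    action: str      # the action the protocol prescribes
    condition: str   # belief about the patient that triggers the step

def match_actions(beliefs, guideline, epr_actions):
    """Link each action reported in the EPR to a guideline intention.

    An EPR action matches when some guideline step prescribes it and the
    step's triggering condition is among the current beliefs about the
    patient; unmatched actions are candidates for critique.
    """
    matched, unmatched = [], []
    for action in epr_actions:
        steps = [s for s in guideline
                 if s.action == action and s.condition in beliefs]
        if steps:
            matched.append((action, steps[0].goal))
        else:
            unmatched.append(action)
    return matched, unmatched

beliefs = {"elevated blood pressure", "type 2 diabetes"}
guideline = [
    GuidelineStep("control hypertension", "prescribe ACE inhibitor",
                  "elevated blood pressure"),
    GuidelineStep("control glycaemia", "prescribe metformin",
                  "type 2 diabetes"),
]
epr = ["prescribe ACE inhibitor", "order chest X-ray"]
print(match_actions(beliefs, guideline, epr))
# ([('prescribe ACE inhibitor', 'control hypertension')], ['order chest X-ray'])
```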

    Evaluating Medical Lexical Simplification: Rule-Based vs. BERT

    Lexical simplification (LS) can decrease the communication gap between medical experts and laypeople by replacing medical terms with layperson counterparts. In this paper, we present: 1) a rule-based approach to LS using a consumer health vocabulary, and 2) an unsupervised approach using BERT to generate word candidates. Human evaluation shows that the unsupervised model performed better for simplicity and grammaticality, while the rule-based method was better at meaning preservation.
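    As a rough illustration of the two candidate-generation strategies, the sketch below pairs a dictionary lookup against a small consumer health vocabulary with masked-language-model substitutes from BERT via Hugging Face Transformers; the vocabulary entries and the bert-base-uncased model choice are assumptions, not the paper's exact setup.

```python
# A minimal sketch contrasting the two candidate-generation strategies; the
# vocabulary entries are invented and the model choice is an assumption.
from transformers import pipeline

# 1) Rule-based: replace a medical term using a consumer health vocabulary.
consumer_vocab = {"hypertension": "high blood pressure",
                  "myocardial infarction": "heart attack"}

def simplify_rule_based(sentence: str) -> str:
    for term, layperson in consumer_vocab.items():
        sentence = sentence.replace(term, layperson)
    return sentence

# 2) Unsupervised: mask the difficult term and let BERT propose substitutes.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

def candidates_bert(sentence: str, term: str, k: int = 5):
    masked = sentence.replace(term, fill_mask.tokenizer.mask_token)
    return [p["token_str"] for p in fill_mask(masked, top_k=k)]

sentence = "The patient has a history of hypertension."
print(simplify_rule_based(sentence))
print(candidates_bert(sentence, "hypertension"))
```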

    Domain-Independent Quality Measures for Crowd Truth Disagreement

    Using crowdsourcing platforms such as CrowdFlower and Amazon Mechanical Turk for gathering human annotation data has now become a mainstream process. Such crowd involvement can reduce the time needed for solving an annotation task and, with a large number of annotators, can be a valuable source of annotation diversity. In order to harness this diversity across domains, it is critical to establish a common ground for quality assessment of the results. In this paper we report our experiences in optimizing and adapting crowdsourcing micro-tasks across domains, considering three aspects: (1) the micro-task template, (2) the quality measurements for the workers' judgments, and (3) the overall annotation workflow. We performed experiments in two domains, i.e. event extraction (MRP project) and medical relation extraction (Crowd-Watson project). The results confirm our main hypothesis that some aspects of the evaluation metrics for micro-tasks can be defined in a domain-independent way, assessing the parameters needed to harness the diversity of annotations and the useful disagreement between workers. This paper focuses specifically on the parameters relevant for the 'event extraction' ground-truth data collection and demonstrates their reusability from the medical domain.
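    One way to picture a domain-independent worker measure is a leave-one-out agreement score, sketched below on toy data; the formula and data are illustrative assumptions rather than the metrics actually defined in the MRP or Crowd-Watson projects.

```python
# A minimal sketch of one domain-independent worker metric (leave-one-out
# agreement with the rest of the crowd); the toy data and exact formula are
# illustrative, not the projects' published metric definitions.
import numpy as np

def worker_agreement(annotations: np.ndarray) -> np.ndarray:
    """Cosine similarity of each worker's vector with the sum of the rest.

    Rows are workers, columns are annotation options for one micro-task unit;
    a low score flags a worker who systematically diverges from the crowd.
    """
    scores = []
    for i, row in enumerate(annotations):
        rest = annotations.sum(axis=0) - row
        denom = np.linalg.norm(row) * np.linalg.norm(rest)
        scores.append(float(row @ rest / denom) if denom else 0.0)
    return np.array(scores)

annotations = np.array([[1, 0, 0],
                        [1, 1, 0],
                        [0, 0, 1]])   # worker 3 disagrees with the others
print(worker_agreement(annotations).round(2))  # [0.58 0.5  0.  ]
```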

    Enabling protocol-based medical critiquing

    This paper investigates the combination of expert critiquing systems and formal medical protocols. Medical protocols might serve as a suitable basis for an expert critiquing system because of the ongoing acceptance of medical protocols and the rise of both evidence-based practice and evidence-based protocols.